Hello there!
My name is Mick O Donnell and I will take you through a tour of what RMarkdown can do for your work or research with R.
RMarkdown is an extremely powerful, yet simple to implement micro-language which enables the mixing of analysis code with a narrative which can explain the code.
You’ve probably heard of html or xml… (Bonus points for knowing what the letters stand for!!!). Well, markdown is a well established simple text layout language, which has been built upon to create a language which is tailored for use with R.
It was first developed in 2012 by Yihui Xie - you can find a lot more detail on the language in his book which outlines how you can author documents such as books or dissertations in RMarkdown. Xie now works for RStudio and has developed an amazingly versatile and extensible system, which spans production of all forms of communication, from websites (blogdown) and slides to dissertations and journal articles.
The best way to learn it is to give yourself a project to do with it. I wrote an MSc Thesis in it, and while it was a challenge, it very much suited the research, as I wanted to achieve a reproducible analysis workflow - and I got to learn a new skill…!
My dissertation
It is fully integrated with RStudio, with options to create all forms of RMarkdown media, buttons to action the various commands needed to “knit” a markdown document, and allowing customisation of the type of output you want. While RMarkdown is a native web format, and using it to knit a html document means you can exploit all the power of web (javascript based) packages such as leaflet and plotly, it is also possible to output to pdf and even Microsoft Word formats.
One thing I would highly advise is to print out and laminate the handy cheatsheets available here. They’re invaluable and should be on the desk of any R user.
For the purpose of exposing some of the features and convenience of RMarkdown for reproducible analysis, we will look at a recently released dataset - the Valuation Office released as open data information relating to commercial property for several counties around Ireland. You can learn more about this here.
We will import the data directly as a csv table from the api - the query string is obtained via the query engine on the Valutaion Office website. You can specify several different parameters such as the county, what fields you want to drawdown and what categories of business you are interested in. I have specified all the data for Dublin City Council.
if (!file.exists("data/valoff_dcc.csv")) {
valoff_dcc <- read_csv("https://api.valoff.ie/api/Property/GetProperties?Fields=*&LocalAuthority=DUBLIN%20CITY%20COUNCIL&CategorySelected=OFFICE%2CFUEL%2FDEPOT%2CLEISURE%2CINDUSTRIAL%20USES%2CHEALTH%2CHOSPITALITY%2CMINERALS%2CMISCELLANEOUS%2CRETAIL%20(SHOPS)%2CUTILITY%2CRETAIL%20(WAREHOUSE)%2CNO%20CATEGORY%20SELECTED%2CCENTRAL%20VALUATION%20LIST%2CCHECK%20CATEGORY%2CNON-LIST%2CNON-LIST%20EXEMPT&Format=csv&Download=true")
valoff_dcc %>% write_csv("data/valoff_dcc.csv")
} else valoff_dcc <- read_csv("data/valoff_dcc.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## `Property Number` = col_double(),
## Valuation = col_double(),
## `Car Park` = col_double(),
## `X ITM` = col_double(),
## `Y ITM` = col_double(),
## Area = col_double(),
## `Nav Per M2` = col_double(),
## Nav = col_double()
## )
## See spec(...) for full column specifications.
valoff_dcc %<>%
mutate(prop_num = as.character(`Property Number`)) %>%
select(-`Property Number`) %>%
select(prop_num, everything())
This results in a table with 49786 rows and 20 columns. It might be helpful to inspect the data.
str(valoff_dcc)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 49786 obs. of 20 variables:
## $ Property Number : num 837643 837643 791577 791577 791577 ...
## $ Publication Date: chr "31/12/2013" "31/12/2013" "31/12/2013" "31/12/2013" ...
## $ County : chr "DUBLIN" "DUBLIN" "DUBLIN" "DUBLIN" ...
## $ Local Authority : chr "DUBLIN CITY COUNCIL" "DUBLIN CITY COUNCIL" "DUBLIN CITY COUNCIL" "DUBLIN CITY COUNCIL" ...
## $ Valuation : num 0 0 89100 89100 89100 89100 89100 89100 89100 89100 ...
## $ Category : chr "CENTRAL VALUATION LIST" "CENTRAL VALUATION LIST" "RETAIL (SHOPS)" "RETAIL (SHOPS)" ...
## $ Uses : chr "NETWORK (TELECOM), TELEPHONE (EXCHANGE)" "NETWORK (TELECOM), TELEPHONE (EXCHANGE)" "CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE" "CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE" ...
## $ Address 1 : chr "12 DENZILLE LANE" "12 DENZILLE LANE" "47-48 Lower Baggot Street" "47-48 Lower Baggot Street" ...
## $ Address 2 : chr "DUBLIN 2" "DUBLIN 2" "Dublin 2" "Dublin 2" ...
## $ Address 3 : chr NA NA NA NA ...
## $ Address 4 : chr NA NA NA NA ...
## $ Address 5 : chr NA NA NA NA ...
## $ Car Park : num 0 0 0 0 0 0 0 0 0 0 ...
## $ X ITM : num 316704 316704 316733 316733 316733 ...
## $ Y ITM : num 233735 233735 233230 233230 233230 ...
## $ Level : chr "1" "-1" "2" "2" ...
## $ Floor Use : chr "OFFICE(S)" "CAR SPACES (NO. OF)" "TOILETS" "STORE" ...
## $ Area : num 220 2 24.7 64.5 33.1 ...
## $ Nav Per M2 : num 0 0 100 100 120 132 175 350 700 1 ...
## $ Nav : num 0 0 2467 6452 3967 ...
## - attr(*, "spec")=
## .. cols(
## .. `Property Number` = col_double(),
## .. `Publication Date` = col_character(),
## .. County = col_character(),
## .. `Local Authority` = col_character(),
## .. Valuation = col_double(),
## .. Category = col_character(),
## .. Uses = col_character(),
## .. `Address 1` = col_character(),
## .. `Address 2` = col_character(),
## .. `Address 3` = col_character(),
## .. `Address 4` = col_character(),
## .. `Address 5` = col_character(),
## .. `Car Park` = col_double(),
## .. `X ITM` = col_double(),
## .. `Y ITM` = col_double(),
## .. Level = col_character(),
## .. `Floor Use` = col_character(),
## .. Area = col_double(),
## .. `Nav Per M2` = col_double(),
## .. Nav = col_double()
## .. )
valoff_dcc %>% inspect_cat()
## # A tibble: 12 x 5
## col_name cnt common common_pcnt levels
## <chr> <int> <chr> <dbl> <list>
## 1 Address 1 16033 SHOP 0.809 <tibble [16,03~
## 2 Address 2 6627 Dublin 2 5.85 <tibble [6,627~
## 3 Address 3 1223 <NA> 31.8 <tibble [1,223~
## 4 Address 4 285 <NA> 78.5 <tibble [285 x~
## 5 Address 5 68 <NA> 95.9 <tibble [68 x ~
## 6 Category 14 RETAIL (SHOPS) 46.7 <tibble [14 x ~
## 7 County 1 DUBLIN 100 <tibble [1 x 3~
## 8 Floor Use 161 OFFICE(S) 33.2 <tibble [161 x~
## 9 Level 49 0 60.9 <tibble [49 x ~
## 10 Local Authori~ 1 DUBLIN CITY COUNCIL 100 <tibble [1 x 3~
## 11 Publication D~ 1 31/12/2013 100 <tibble [1 x 3~
## 12 Uses 792 OFFICE (GEORGIAN / VIC~ 8.62 <tibble [792 x~
So we have a column which serves as a Property Identifier, some columns with geographical info (Local Authority, County, Address, X ITM and Y ITM which are coordinate values) as well as information which describes the category of use (Category and Uses, Floor Use), the area (Area) and valuation attributed to the record (Nav per M2 and Nav).
Lets have a look at the first 20 records… I’m using the knitr::kable() function in conjuction with kableExtra() which allows customisation of table options. This is ideal for html output, but causes some issues with pdf output, if I recall correctly - some styling options are only available for html. A lot more detail on customisation of kableExtra() available here. kableExtra uses applies the twitter bootstrap theme to html tables, with oodles of customisation.
valoff_dcc %>% head(20) %>% kable() %>% kable_styling(bootstrap_options = "striped", full_width = T)
| prop_num | Publication Date | County | Local Authority | Valuation | Category | Uses | Address 1 | Address 2 | Address 3 | Address 4 | Address 5 | Car Park | X ITM | Y ITM | Level | Floor Use | Area | Nav Per M2 | Nav |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 837643 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 0 | CENTRAL VALUATION LIST | NETWORK (TELECOM), TELEPHONE (EXCHANGE) | 12 DENZILLE LANE | DUBLIN 2 | NA | NA | NA | 0 | 316703.8 | 233735.3 | 1 | OFFICE(S) | 220.00 | 0.0 | 0.00 |
| 837643 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 0 | CENTRAL VALUATION LIST | NETWORK (TELECOM), TELEPHONE (EXCHANGE) | 12 DENZILLE LANE | DUBLIN 2 | NA | NA | NA | 0 | 316703.8 | 233735.3 | -1 | CAR SPACES (NO. OF) | 2.00 | 0.0 | 0.00 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 2 | TOILETS | 24.67 | 100.0 | 2467.00 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 2 | STORE | 64.52 | 100.0 | 6452.00 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 1 | STORE | 33.06 | 120.0 | 3967.20 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 1 | RESTAURANT | 101.93 | 132.0 | 13454.76 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 0 | RETAIL ZONE C | 14.35 | 175.0 | 2511.25 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 0 | RETAIL ZONE B | 36.62 | 350.0 | 12817.00 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 0 | RETAIL ZONE A | 53.54 | 700.0 | 37478.00 |
| 791577 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 47-48 Lower Baggot Street | Dublin 2 | NA | NA | NA | 0 | 316732.8 | 233229.5 | 0 | OFF LICENCE | 10000.00 | 1.0 | 10000.00 |
| 2135648 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | GROUND FLOOR + BASEMENT | 64 O’CONNELL ST. UPPER | DUBLIN 1 | NA | NA | 0 | 315863.2 | 234710.4 | -1 | STORE | 74.46 | 80.0 | 5956.80 |
| 2135648 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | GROUND FLOOR + BASEMENT | 64 O’CONNELL ST. UPPER | DUBLIN 1 | NA | NA | 0 | 315863.2 | 234710.4 | 0 | RETAIL ZONE REMAINDER | 36.34 | 100.0 | 3634.00 |
| 2135648 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | GROUND FLOOR + BASEMENT | 64 O’CONNELL ST. UPPER | DUBLIN 1 | NA | NA | 0 | 315863.2 | 234710.4 | 0 | RETAIL ZONE C | 36.95 | 200.0 | 7390.00 |
| 2135648 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | GROUND FLOOR + BASEMENT | 64 O’CONNELL ST. UPPER | DUBLIN 1 | NA | NA | 0 | 315863.2 | 234710.4 | 0 | RETAIL ZONE B | 46.03 | 400.0 | 18412.00 |
| 2135648 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | GROUND FLOOR + BASEMENT | 64 O’CONNELL ST. UPPER | DUBLIN 1 | NA | NA | 0 | 315863.2 | 234710.4 | 0 | RETAIL ZONE A | 33.53 | 800.0 | 26824.00 |
| 2135648 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | GROUND FLOOR + BASEMENT | 64 O’CONNELL ST. UPPER | DUBLIN 1 | NA | NA | 0 | 315863.2 | 234710.4 | 0 | OFF LICENCE | 1.00 | 9330.0 | 9330.00 |
| 716154 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 14970 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 82 Ballymun Road | Dublin 9 | NA | NA | NA | 0 | 315436.3 | 237962.1 | 0 | RETAIL ZONE A | 36.79 | 300.0 | 11037.00 |
| 716154 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 14970 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 82 Ballymun Road | Dublin 9 | NA | NA | NA | 0 | 315436.3 | 237962.1 | 0 | RETAIL ZONE B | 12.39 | 150.0 | 1858.50 |
| 716154 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 14970 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 82 Ballymun Road | Dublin 9 | NA | NA | NA | 0 | 315436.3 | 237962.1 | 0 | STORE | 3.90 | 30.0 | 117.00 |
| 716154 | 31/12/2013 | DUBLIN | DUBLIN CITY COUNCIL | 14970 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE | 82 Ballymun Road | Dublin 9 | NA | NA | NA | 0 | 315436.3 | 237962.1 | 0 | OFF LICENCE | 1.00 | 1963.5 | 1963.50 |
So, as we can see, we have multiple properties per row - can anyone guess what each row might represent?
valoff_dcc %>%
select(prop_num, Valuation, Category, Uses) %>%
distinct(prop_num, .keep_all = TRUE) -> rates_properties # 20306
rates_properties %>%
head(50) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
| prop_num | Valuation | Category | Uses |
|---|---|---|---|
| 837643 | 0 | CENTRAL VALUATION LIST | NETWORK (TELECOM), TELEPHONE (EXCHANGE) |
| 791577 | 89100 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 2135648 | 71500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 716154 | 14970 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 841510 | 51300 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 2119783 | 18390 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 1484301 | 78600 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 853072 | 34200 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 2193101 | 91200 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 853528 | 89500 | RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], OFF-LICENCE |
| 766233 | 29700 | RETAIL (SHOPS) | ETHNIC FOOD SHOP, OFF-LICENCE |
| 766621 | 11020 | RETAIL (SHOPS) | ETHNIC FOOD SHOP, OFF-LICENCE |
| 852714 | 450000 | HOSPITALITY | PUB, OFF-LICENCE |
| 766619 | 40000 | HOSPITALITY | PUB, OFF-LICENCE |
| 818727 | 110000 | HOSPITALITY | PUB, OFF-LICENCE |
| 1031290 | 44200 | HOSPITALITY | PUB, OFF-LICENCE |
| 714197 | 70200 | RETAIL (SHOPS) | SUPERMARKET 1 [200-500 SQ. M.], OFF-LICENCE |
| 688917 | 259000 | RETAIL (SHOPS) | SUPERMARKET 2 [500-2500 SQ. M.], OFF-LICENCE |
| 808950 | 46400 | RETAIL (SHOPS) | SHOP, OFF-LICENCE |
| 814595 | 50900 | RETAIL (SHOPS) | CLOTHES SHOP, CHARITY SHOP |
| 814592 | 34500 | RETAIL (SHOPS) | CLOTHES SHOP, CHARITY SHOP |
| 1030481 | 4960 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 666082 | 3770 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 612387 | 6290 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 600833 | 3370 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 599648 | 5460 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 695162 | 6580 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 714316 | 6240 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 731816 | 3790 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 711012 | 5980 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 819039 | 4660 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 715223 | 2750 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 715995 | 3960 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 719403 | 4060 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 666145 | 4230 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 787968 | 5720 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 5017136 | 12320 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 2147280 | 8060 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 786969 | 6110 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 786967 | 8450 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 715138 | 6240 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 5009508 | 5780 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 785379 | 8290 | LEISURE | SPORTS & LEISURE CENTRE, HAIRDRESSING SALON |
| 715211 | 2720 | OFFICE | OFFICE (OWN DOOR), HAIRDRESSING SALON |
| 856372 | 43900 | RETAIL (SHOPS) | ETHNIC FOOD SHOP, HAIRDRESSING SALON |
| 813050 | 4360 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 2166706 | 6510 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 719415 | 5620 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 594955 | 3260 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
| 666083 | 3670 | OFFICE | OFFICE (OVER THE SHOP), HAIRDRESSING SALON |
There are 20306 different properties listed for Dublin City Council. Let’s see how the categorical variables breakdown…
rates_properties %>%
summarise(average_RV = mean(Valuation),
total_RV = sum(Valuation),
number_of_properties = n()) %>%
kable()%>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
| average_RV | total_RV | number_of_properties |
|---|---|---|
| 63713.82 | 1293772817 | 20306 |
rates_properties %>%
count(Uses, sort = TRUE) %>%
head(50) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
| Uses | n |
|---|---|
| OFFICE (GEORGIAN / VICTORIAN), - | 1683 |
| OFFICE (OVER THE SHOP), - | 1356 |
| OFFICE (3RD GENERATION), - | 1200 |
| WAREHOUSE, - | 1061 |
| WORKSHOP, - | 804 |
| SHOP, - | 712 |
| OFFICE (OWN DOOR), - | 648 |
| RESTAURANT, - | 590 |
| PUB, - | 465 |
| STORE, - | 448 |
| HAIRDRESSING SALON, - | 414 |
| CLOTHES SHOP, - | 355 |
| OFFICE (2ND GENERATION), - | 351 |
| TAKE AWAY, - | 298 |
| OFFICE (BUSINESS PARK), - | 251 |
| OFFICE (HOUSE), - | 231 |
| CAFE, - | 228 |
| PHARMACY, - | 228 |
| OFFICE (HOUSE), OFFICE (GEORGIAN / VICTORIAN) | 202 |
| SHOP (OFFICES), - | 198 |
| OFFICE, - | 193 |
| BETTING SHOP, - | 178 |
| CONVENIENCE STORE [<200 SQ. M.], - | 166 |
| WAREHOUSE, OFFICE | 160 |
| NEWSAGENT, - | 157 |
| OFFICE (3RD GENERATION), | 150 |
| CAR PARK (OFFICE), - | 147 |
| RIGHT OF TRADING, - | 146 |
| ADVERTISING STATION, SHEET 48 | 141 |
| VACANT, - | 136 |
| OFFICE (1ST GENERATION), - | 135 |
| OFFICE (MEWS), - | 123 |
| BEAUTY SALON / MASSAGE, - | 119 |
| DRY CLEANERS / LAUNDERETTE, - | 111 |
| BUTCHER, - | 103 |
| CHECK USE, | 102 |
| OFFICE (INDUSTRIAL), - | 102 |
| CAR PARK (SURFACE), - | 101 |
| OFFICE (GEORGIAN / VICTORIAN), | 99 |
| JEWELLERS, - | 98 |
| HOTEL, - | 96 |
| HARDWARE / DIY, - | 89 |
| BANK, - | 88 |
| OFFICE (HOUSE), SURGERY | 88 |
| OFFICE (OFFICE PARK), - | 81 |
| FACTORY, - | 78 |
| AUCTIONEER, - | 74 |
| COFFEE SHOP, - | 74 |
| KIOSK, - | 73 |
| SHOE SHOP, - | 73 |
rates_properties %>%
count(Category, sort = TRUE) %>%
head(50) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
| Category | n |
|---|---|
| OFFICE | 7660 |
| RETAIL (SHOPS) | 7184 |
| INDUSTRIAL USES | 3332 |
| MISCELLANEOUS | 822 |
| HOSPITALITY | 672 |
| LEISURE | 214 |
| CHECK CATEGORY | 105 |
| RETAIL (WAREHOUSE) | 93 |
| HEALTH | 83 |
| FUEL/DEPOT | 69 |
| NO CATEGORY SELECTED | 44 |
| CENTRAL VALUATION LIST | 13 |
| UTILITY | 12 |
| NON-LIST | 3 |
rates_properties %>%
count(Category, Uses, sort = TRUE) -> category_use_table
category_use_table %>%
head(50) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = T)
| Category | Uses | n |
|---|---|---|
| OFFICE | OFFICE (GEORGIAN / VICTORIAN), - | 1683 |
| OFFICE | OFFICE (OVER THE SHOP), - | 1356 |
| OFFICE | OFFICE (3RD GENERATION), - | 1197 |
| INDUSTRIAL USES | WAREHOUSE, - | 1061 |
| INDUSTRIAL USES | WORKSHOP, - | 804 |
| RETAIL (SHOPS) | SHOP, - | 712 |
| OFFICE | OFFICE (OWN DOOR), - | 648 |
| RETAIL (SHOPS) | RESTAURANT, - | 590 |
| HOSPITALITY | PUB, - | 465 |
| RETAIL (SHOPS) | HAIRDRESSING SALON, - | 414 |
| INDUSTRIAL USES | STORE, - | 364 |
| RETAIL (SHOPS) | CLOTHES SHOP, - | 355 |
| OFFICE | OFFICE (2ND GENERATION), - | 351 |
| RETAIL (SHOPS) | TAKE AWAY, - | 298 |
| OFFICE | OFFICE (BUSINESS PARK), - | 251 |
| OFFICE | OFFICE (HOUSE), - | 231 |
| RETAIL (SHOPS) | CAFE, - | 228 |
| RETAIL (SHOPS) | PHARMACY, - | 228 |
| OFFICE | OFFICE (HOUSE), OFFICE (GEORGIAN / VICTORIAN) | 202 |
| RETAIL (SHOPS) | SHOP (OFFICES), - | 198 |
| RETAIL (SHOPS) | BETTING SHOP, - | 178 |
| RETAIL (SHOPS) | CONVENIENCE STORE [<200 SQ. M.], - | 166 |
| INDUSTRIAL USES | WAREHOUSE, OFFICE | 160 |
| RETAIL (SHOPS) | NEWSAGENT, - | 157 |
| OFFICE | OFFICE (3RD GENERATION), | 150 |
| MISCELLANEOUS | CAR PARK (OFFICE), - | 147 |
| RETAIL (SHOPS) | RIGHT OF TRADING, - | 146 |
| MISCELLANEOUS | ADVERTISING STATION, SHEET 48 | 141 |
| OFFICE | OFFICE (1ST GENERATION), - | 135 |
| INDUSTRIAL USES | OFFICE, - | 129 |
| RETAIL (SHOPS) | VACANT, - | 129 |
| OFFICE | OFFICE (MEWS), - | 123 |
| RETAIL (SHOPS) | BEAUTY SALON / MASSAGE, - | 119 |
| RETAIL (SHOPS) | DRY CLEANERS / LAUNDERETTE, - | 111 |
| RETAIL (SHOPS) | BUTCHER, - | 103 |
| CHECK CATEGORY | CHECK USE, | 102 |
| OFFICE | OFFICE (INDUSTRIAL), - | 102 |
| MISCELLANEOUS | CAR PARK (SURFACE), - | 101 |
| OFFICE | OFFICE (GEORGIAN / VICTORIAN), | 99 |
| RETAIL (SHOPS) | JEWELLERS, - | 98 |
| HOSPITALITY | HOTEL, - | 96 |
| RETAIL (SHOPS) | HARDWARE / DIY, - | 89 |
| OFFICE | OFFICE (HOUSE), SURGERY | 88 |
| RETAIL (SHOPS) | BANK, - | 88 |
| OFFICE | OFFICE (OFFICE PARK), - | 81 |
| INDUSTRIAL USES | FACTORY, - | 78 |
| RETAIL (SHOPS) | STORE, - | 75 |
| RETAIL (SHOPS) | AUCTIONEER, - | 74 |
| RETAIL (SHOPS) | COFFEE SHOP, - | 74 |
| RETAIL (SHOPS) | KIOSK, - | 73 |
As can be seen, the principal category is “Office”, the principal use is “OFFICE (GEORGIAN / VICTORIAN)” which might be expected for the Dublin City Council area.
Let’s investigate whether the valuation of a property might be related to the category of use of the property. In order to do this we will have to control for the overall area of the property, so we will need to add the area accross all rows for the property. Before we do this, there is a bit of data cleaning needed…
* Off-licenses are treated differently - filter out * Car-spaces are treated differently - filter out * there are some negative values - treat these as positive values * remove NA floor values * anything with an area values of 1 / -1 seems to be treated differently - filter out
note that I use the
%<>%assignment pipe operator in the next section - more on this at R for Data Science
valoff_dcc %>%
filter(!(Area == 1.0 | Area == -1.0 | Area == 0)) %>% # remove illogical Area values
filter(!`Floor Use` %in% c("CAR SPACES (NO. OF)")) %>% # remove car space rows
filter(!is.na(`Floor Use`)) %>% # remove NA values in Floor Space
mutate(Area = Mod(Area)) %>%
select(prop_num, Valuation, Area, Category, x = `X ITM`, y = `Y ITM`) -> munged_valoff_areas # convert negative values of Area to positive
# next we sum the area values accross the property with a group process...
munged_valoff_areas %>%
group_by(prop_num) %>%
summarise(mean_val = mean(Valuation),
total_area = sum(Area),
Category = first(Category),
x = mean(x),
y = mean(y),
num_areas = n()) -> val_by_cat_area
# add a column to represent the valuation per area
val_by_cat_area %<>% # note use of assignment pipe
mutate(val_per_area = mean_val / total_area)
The next step is to plot the data. First we’ll do a bog standard ggplot, with a scatter of valuation versus total area, coloured by Category. We need to use log scales for both area and valuation..
val_by_cat_area %>%
filter(mean_val > 0) %>% # remove the few 0 values in mean_val
ggplot(aes(total_area, mean_val)) +
geom_point(aes(color = Category)) +
scale_x_log10(labels = number_format()) + # number format from scales is for nive labels
scale_y_log10(labels = number_format()) +
labs(y = "Valuation",
x = "Total Area")
The beauty of RMarkdown is that you can leverage the full range of interactive html widgets that R supports. One of these is plotly, which has a cool function called ggplotly which allows you to pass in a ggplot object and generate an interactive plot for you. Let’s have a look…
# lets pull back some of the data we had before, such as the address..
val_by_cat_area %>%
left_join(valoff_dcc %>% select(prop_num, c("Address 1", "Address 2", "Address 3"))) -> val_by_cat_area_w_address
## Joining, by = "prop_num"
# put the above dataframe into the ggplot object as above
val_by_cat_area_w_address %>%
mutate(add1 = `Address 1`,
add2 = `Address 2`,
add3 = `Address 3`) %>%
filter(mean_val > 0) %>% # remove the few 0 values in mean_val
sample_frac(0.2) %>% # filter for 20% of the data to reduce load on html parser
ggplot(aes(total_area, mean_val)) +
geom_point(aes(color = Category, Address1 = add1, Address2 = add2, Address3 = add3)) +
scale_x_log10(labels = number_format()) + # number format from scales is for nive labels
scale_y_log10(labels = number_format()) +
labs(y = "Valuation",
x = "Total Area") -> val_area_scatter
## Warning: Ignoring unknown aesthetics: Address1, Address2, Address3
ggplotly(val_area_scatter,
tooltip = c("Address1", "Address2", "Address3"))
As you can see, the ability to incorporate an interactive chart in a document is a very useful feature.
Lets look at mapping the information.